Conversation
Merging this PR will degrade performance by 24.45%
Performance Changes
Comparing Footnotes
|
c3d7434 to
6c1d7aa
Compare
860cc64 to
042394b
Compare
df8dbf5 to
862d814
Compare
|
@claude review please |
This comment was marked as resolved.
This comment was marked as resolved.
|
@claude i have addressed your main comments. please give another review. |
This comment was marked as resolved.
This comment was marked as resolved.
daf258f to
578d83d
Compare
|
@claude please review |
This comment was marked as resolved.
This comment was marked as resolved.
|
@claude PR is ready for another review |
|
Claude finished @a10y's task in 8m 28s —— View job PR Review: PatchedArray basics and wiringTasks:
Overall this is solid work — the design is clean, the test coverage is good, and the key operations (slice, filter, take, compare, scalar_at) are all well-structured. I found three correctness bugs of varying severity. Details below. 🔴 Bug 1:
|
This comment was marked as resolved.
This comment was marked as resolved.
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
We call append_to_builder on the inner first, then just do a single pass and overwrite what it just wrote. Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
8648244 to
a9ce27f
Compare
| pub fn reinterpret<V: Sized>(&self) -> &[V] { | ||
| assert!( | ||
| self.is_aligned(Alignment::of::<V>()), | ||
| "ByteBuffer not properly aligned to {}", | ||
| type_name::<V>() | ||
| ); | ||
|
|
||
| assert_eq!( | ||
| self.length % size_of::<V>(), | ||
| 0, | ||
| "ByteBuffer length not a multiple of the value length" | ||
| ); | ||
|
|
||
| let v_len = self.length / size_of::<V>(); | ||
| let v_ptr = self.bytes.as_ptr().cast::<V>(); | ||
|
|
||
| // SAFETY: we checked that alignment and length are suitable to treat this as a &[V]. | ||
| unsafe { std::slice::from_raw_parts(v_ptr, v_len) } | ||
| } | ||
| } |
There was a problem hiding this comment.
did we never do this before?
There was a problem hiding this comment.
I coudln't find anything no
There was a problem hiding this comment.
shall we add a few more of these with different types? e.g. string?
There was a problem hiding this comment.
Executing anything other than primitive would fail
| pub(super) len: usize, | ||
|
|
||
| /// lane offsets. The PType of these MUST be u32 | ||
| pub(super) lane_offsets: BufferHandle, |
There was a problem hiding this comment.
are you sure we don't want to have the option to compress these?
| /// Number of 1024-element chunks. Pre-computed for convenience. | ||
| pub(super) n_chunks: usize, |
There was a problem hiding this comment.
this will increase the stack size and maybe register pressure do avoid / 1024?
| /// Number of lanes the patch indices and values have been split into. Each of the `n_chunks` | ||
| /// of 1024 values is split into `n_lanes` lanes horizontally, each lane having 1024 / n_lanes | ||
| /// values that might be patched. | ||
| pub(super) n_lanes: usize, |
There was a problem hiding this comment.
This I think we need to store incase we change the default ptype -> lane mapping in the future
There was a problem hiding this comment.
I think this kind of change would require new arrays. The data layout would change
| } | ||
|
|
||
| #[derive(Clone, prost::Message)] | ||
| pub struct PatchedMetadata { |
There was a problem hiding this comment.
I wonder if you can pack this into a u128?
There was a problem hiding this comment.
hm, is there a reason to do that over struct? can protobuf even encode u128?
There was a problem hiding this comment.
I mean using less and smaller fields
| /// in a filter or slice it may be sliced to the nearest chunk boundary. | ||
| #[prost(uint64, tag = "1")] | ||
| pub(crate) inner_len: u64, | ||
|
|
There was a problem hiding this comment.
This is used only to fetch
There was a problem hiding this comment.
what do you mean by this
There was a problem hiding this comment.
I am asking why we store this, and its likely cause the inner may have a different len to the result patch array?
| array.indices = children.remove(0); | ||
| array.values = children.remove(0); | ||
|
|
||
| Ok(()) |
There was a problem hiding this comment.
on debug check the dtypes?
Summary
This is the first PR in a series addressing the PatchedArray RFC: vortex-data/rfcs#27
This PR adds a new
PatchedArrayarray variant, which is slated to only be used with FastLanes array types. The design is largely documented in the RFC, but brieflyWe are able to pushdown the following at reduce time:
There will be follow ups to add the wiring into CUDA and to update how BitPacked and ALP arrays are written.
Testing
There are unit tests for all of the reducers and kernels